FOREWORD


This  Technical  Memorandum  presents  a  report  for  reference 
during  the  process  of  validating  the  UK  MOVES  and  US  MOVES 
models. Validation is a process in which an inference is
accepted  that  a  simulator  is  a  correct  (valid)  representation 
of  a  real-world  situation.  If  a  model  has  not  been  recently 
validated,  it  should  not  be  used  since  the  decision  maker 
could not depend on its results. The governing requirement in the
use of MOVES for experimenting with alternative solutions
to problems was that it be validated prior to running each
particular set of experiments. Each squadron/detachment was
simulated and validated in its current state to assure
the users of MOVES that the model could predict accurately.
Experiments were then run with a high degree of confidence
placed in the predictions.

The  MOVES  model  is  a  computerized  simulation  model,  and 
has  been  heavily  used  by  the  US  Marine  Corps,  and  UK  Royal 
Navy  in  "try-before-buy"  exercises,  to  simulate  AV-8A  Harrier 
and  Sea  Harrier  operations,  maintenance  and  supply  support. 

By  simulating  via  use  of  the  validated  MOVES  model,  problems 
and  mistakes  have  been  avoided,  saving  time,  money  and  the 
inconvenience  of  living  with  expensive  and  undesirable 
situations  that  could  have  resulted  from  untried  real-world 
actions.

This  Statistical  Validation  report  can  be  used  in  validating 
other  simulation  models,  but  should  not  be  viewed  as  a 
comprehensive  statistical  validation  text.  This  report 
illustrates  those  items  that  have  been  satisfactorily  used 
and  discussed  with  the  U.S.  and  U.K.  users  of  MOVES. 
SUMMARY


The  problem  under  study  was  how  to  convince  the  Harrier 
aircraft  decision  makers,  e.g.,  squadron  operator,  NAVAIR  and 
DOD  management,  that  the  MOVES  model  was  worth  the  trouble  and 
cost  of  using  it.  Many  times  the  question  posed  to  NAVWESA  was, 
"After  we  describe  operating  logic  and  data  to  you,  how  do  we 
know  all  our  effort  was  worth  it,  i.e.,  how  do  we  know  the  model 
is any good?" This concern was genuine, since the elements and
data addressed by MOVES are many in number and broad in scope,
and many people have put a great deal of time and effort into the
construction of simulation models that proved worthless. The
approach taken to answer the question was to prove to these decision
makers that MOVES could accurately predict the consequences of
experimenting on simulated squadron activities, thus avoiding
the time and trouble of experimenting on the actual real-world
process itself. This proof was established by use of the methods
presented in this validation plan during the initial validation
of MOVES in 1975. That is, in 1975, flight operations, maintenance
and supply support of Squadron VMA-542 were simulated using
the MOVES model. The output of MOVES for 3 consecutive months
agreed with (was valid for) the real-world output of the Squadron.
Since then, the model has been continuously validated against
each squadron prior to its use for experimentation on that
squadron, again using the techniques presented in this
validation plan.

The  MOVES  model,  a  computerized  simulation  of  Harrier 
squadron  flight  operations,  maintenance  and  supply  support, 
was  conceived,  designed  and  developed  by  NAVWESA  for  PMA-257 
(NAVAIR AV Project Manager). Its results have been an input to
changes in policy, including stretching of intervals between
phased maintenance, use of the team maintenance concept, and
cannibalization. In 1978, the British Government, under
contract with the U.S. Government, adopted MOVES for use
as their Royal Navy Sea Harrier and Sea King Helicopter
simulation model, with support from NAVWESA. MOVES is a
highly effective tool for use in preventing problems vice
curing them, in the "try-before-buy" mode exercised by the
U.S. and U.K.

As a result of the use of this report to validate the
MOVES model, users in the U.S. and U.K. exhibit a high
acceptance of MOVES. This report should be used as a reference
when validating MOVES if it is tailored for non-AV
aircraft, or when validating other simulation models.
PURPOSE


The  purpose  of  this  report  is  to  provide  a  reference  for  use 
when  validating  the  UK  MOVES  model  and  future  US  MOVES  models. 

PROBLEM 

In simulation work, the most important and difficult
problem to solve is ensuring that the representation (model) of the
real world is legitimate (valid) to the satisfaction
of the model user(s). It then behooves the simulation expert
to establish for the user a high level of confidence in the
predictive powers of the model and its valid use for
experimentation. This problem was kept in view throughout the
design, development and implementation of the US MOVES model.

To  solve  the  problem,  high  levels  of  confidence  were 
established in the US MOVES AV-8A model by continual use
of the methods described in this Statistical Validation Plan.

That  is,  prior  to  each  AV-8A  experiment,  MOVES  was  validated 
for  that  particular  squadron/detachment,  operating  in  their 
current  mode.  Then  experiments  were  run,  and  exhaustive 
analyses  performed  on  the  interaction  of  critical  elements 
within  the  model,  as  well  as  on  the  predictions  output  by 
MOVES. 

VALIDATION 

Validation  is  the  process  of  building  an  acceptable  level 
of  confidence  that  an  inference  about  a  simulated  process  is  a 
correct  (valid)  inference  for  the  real-world  process.  The 
validation of the MOVES model is considered complete when
agreement (objective or subjective) is reached between the
behavior of the simulation (MOVES model) and the real-world
system; the model is then valid for use. The simulator (MOVES)
is  defined  as  a  symbolic  or  numerical  abstraction  of  the 
real-world  process  under  study,  and  is  not  the  real-world 
process  itself.  The  goal  of  simulation  using  the  MOVES 
simulator  is  to  learn  something  about  the  anticipated 
real-world  process  vice  experimenting  on  the  real-world 
process  itself. 

Since  simulations  tend  to  become  far  more  complex  than 
other  Management  Science  models,  validation  is  necessary  to 
instill  an  acceptable  level  of  confidence  when  simulating 
a  real-world  process.  Simulators  allow  the  decision  maker 
to  include  many  different  parts  and  processes  in  one  model, 
and  allow  them  to  interact  in  non-linear,  non-steady-state 
modes. The worth of a simulation model with untested,
untestable,  or  refuted  assumptions  is  questionable. 


Seldom  will  validation  result  in  an  absolute  proof  that 
the  simulator  is  a  "true"  model  of  the  real  process.  The 
simulator produces some output from each run; this output then
needs  validation.  Validation  should  be  done  under  various 
conditions  of  land  and  sea  basings.  Many  approaches  to  the 
proof  of  validation  exist,  each  of  which  may  increase  or 
decrease  the  confidence  in  the  simulation  output.  These 
approaches  include: 

a.  Various  statistical  tests  on  simulator  output  versus 
real-world  performance,  including  confidence  intervals, 
tests  of  means  and  variances,  analysis  of  variance, 
regression,  factor  analysis,  spectral  analysis  and 
auto-correlation,  chi-square,  and  non-parametric  tests. 

See  Appendices  A,B,C,D  for  statistical  discussions. 

b. Special data collection efforts in which data not
normally  collected  are  obtained  from  the  real-world  for 
comparison  with  data  output  at  various  steps  within  the 
simulator.  This  important  area  of  validation  should  be 
well  analyzed  prior  to  application,  since  it  involves 
additional  data  collection. 

c.  Field  tests  in  which  a  process  is  placed  in  an 
operational  situation  and  performance  is  measured  prior 
to  actual  implementation.  This  process  is  then  monitored 
in  actual  operation,  comparing  results  with  the  MOVES 
internal output such as queue congestion.

d.  Complementary  research  to  determine  "why"  situations 
occur,  and  possible  ways  to  improve  the  situation. 

The US MOVES AV-8A model was validated initially in 1975,
and  was  continually  validated  before  each  series  of  simulation 
experiments.  The  US  MOVES  AV-8B  and  UK  MOVES  models,  although 
not  validated  as  of  this  date,  have  been  run  using  best  estimates 
of  input  data  and  logic  describing  the  expected  real-world.  As 
data  and  information  concerning  squadron  operations  evolve,  both 
MOVES  models  should  be  run  in  conjunction  with  operating  the 
Harriers  on  land  or  sea.  These  two  (MOVES  and  squadron  operations) 
will  complement  each  other,  resulting  in  shorter  solutions  to 
operating  problems.  Pre-validation  runs  of  the  MOVES  models 
should  be  made  in  various  configurations  of  land  or  sea  basings, 
along  with  analyses  as  described  in  a,b,c,  and  d  above.  This 
will  increase  confidence  in  the  models'  predictive  powers  during 
and  after  validation,  enabling  decision  makers  to  gain  insight 
into  solutions  of  potential  problems. 
Validation implies use of the model. The "users" of the
model  are  land  and  sea  squadron  managers  and  operators,  along 
with  others  concerned  with  policies  or  other  decisions  regarding 
the  operation  of  the  Harrier.  If  these  "users"  are  not  motivated 
to  use  the  model,  the  model  is  less  than  useful;  it  becomes  an 
academic  exercise. 


BACKGROUND 
The validation process of the US MOVES model initially
took place in August 1975, using the VMA-542 AV-8A squadron
operating at MCAS Cherry Point, North Carolina. Because MOVES
output had been continually validated, those charged with making
decisions  about  the  AV-8A  used  it  to  develop  policy  in  the 
peacetime  and  wartime  areas  of  phase-maintenance,  cannibalization, 
flying  programs,  on-deck  maintenance,  and  maintenance  work-center 
configuration.  The  users  of  the  AV-8A  US  MOVES  model  bear  witness 
to a high degree of confidence in the predictive powers of the
model.  This  validated  model  is  presently  under  transition  from 
the  AV-8A  configuration  to  the  advanced  AV-8B  configuration. 


As  a  result  of  the  confidence  generated  by  the  US  MOVES 
validation  and  use,  in  1978  the  BRN  contacted  the  US  Navy  to 
have  NAVWESA  modify  the  US  MOVES  model  to  represent  the  British 
environment  in  which  the  Royal  Navy  Sea  Harriers  would  operate 
aboard  ship  in  conjunction  with  the  Sea  King  helicopters.  This 
work  resulted  in  the  creation  of  the  UK  MOVES  model.  Many 
experiments  have  been  run  on  this  UK  model,  assisting  the  BRN 
in the composition of wartime/peacetime flying scenarios, along
with  policies  involving  maintenance,  supply  support,  and  the 
interaction  between  the  Sea  Harrier  and  Sea  King. 

The  MOVES  Model 

MOVES is a computerized simulation of the operations,
maintenance, supply and logistics support of Harrier aircraft
squadrons/detachments. The MOVES model predicts the effect
that various combinations of flying programs, maintenance
programs, supply and logistic support have on the capability
of the squadron/detachment to meet its mission. Some elements
that the model predicts as a result of each simulation include:
operational readiness, flying hour program achievement, sorties
flown and lost, non-operational-ready due to supply and
maintenance, downtime due to scheduled and unscheduled
maintenance, direct-maintenance-manhours summary, cannibalization
summary, and a supply profile showing each part ordered from supply,
the number of times the part was in supply and the number of times
the part had to be cannibalized. In addition, MOVES lists a complete
profile of maintenance work center congestion along with
utilization of support equipment. Two versions of MOVES exist:
1) the US MOVES supporting the US Marine Corps in their planning
for operating the AV-8A Harriers, and design and operation of
the AV-8B Harriers; 2) the UK MOVES supporting the British Royal
Navy (BRN) Sea Harrier operations. Both models were conceived,
designed, and developed by the Program Analysis Department
(ESA-3) of the Naval Weapons Engineering Support Activity (NAVWESA).

The  shelves  are  full  of  computerized,  often  costly, 
simulation  models  not  in  use  due  to  reasons  including:  the 
inability  to  validate;  the  inability  to  obtain  useful  input 
data;  poor  and  untimely  communication  between  model  developers 
and  model  users;  inappropriate  level  of  detail;  too  much 
generalization;  model  designers  unfamiliar  with  the  real-world 
being  modeled;  inappropriate  output  statistics  for  use  in 
decision  making;  and  the  length  of  time  for  design/development/ 
validation,  precluding  timely  input  of  model  results  in  decision 
making.  The  MOVES  model  is  not  one  of  these  models. 

CONCLUSIONS  AND  RECOMMENDATIONS 

The  use  of  the  methods  presented  in  this  report  is 
valuable  to  the  process  of  validation  of  the  MOVES  model,  a 
large and complex simulation model. Through this comprehensive
validation  via  use  of  these  methods,  decision  makers  feel  a 
high  degree  of  confidence  with  the  predictive  results  of  the 
MOVES  model.  The  results  of  applying  these  scientific  methods 
are  input  to  the  human  decision  process  in  proving  comprehensive 
validation,  leading  to  extensive  use  of  the  simulation  model. 

The  methods  presented  in  this  report  should  be  used  as  a 
reference when validating other simulation models, large or
small,  complex  or  simple. 


Appendix  A 

Sampling  Distributions 

A distribution which shows the probabilities of obtaining
different values of $\bar{x}$ (the sample mean) is called a theoretical
sampling distribution of $\bar{x}$.

Two important theorems are related to theoretical sampling
distributions:

1) If random samples of size n are taken from a
population which has the mean $\mu$ and standard deviation
$\sigma$, the theoretical sampling distribution of $\bar{x}$ has the
mean $\mu$ and the standard deviation $\sigma/\sqrt{n}$.

This theorem states that the mean of the sampling
distribution of $\bar{x}$ equals the population mean. Also,
$\sigma/\sqrt{n}$ (called the standard error of the mean) is the
standard deviation of the theoretical sampling
distribution.

The standard error of the mean plays an important role
in inductive statistics, because it measures the
variation of the theoretical sampling distribution of
$\bar{x}$. In other words, it tells how much the sample means
can be expected to vary from sample to sample. The
standard error of the mean decreases as n increases.
That is, when n is large, a sample mean will yield a
more reliable estimate of $\mu$ than when n is small.

2) If n is large ($n \geq 30$) and the sample mean is formed from
the sum of independent, identically distributed random variables,
the theoretical sampling distribution of $\bar{x}$ can be
approximated very closely with a normal curve, regardless
of the shape of the population distribution. This is
called the Central Limit theorem, and permits the
calculation of the probabilities of obtaining various
values of $\bar{x}$. Note that this theorem contains no
specification about the shape of the distribution of
the population.
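
As a hedged illustration of the two theorems, the following Python sketch draws repeated samples from a deliberately non-normal (exponential) population and compares the spread of the resulting sample means with $\sigma/\sqrt{n}$. The population, sample sizes, seed, and function names are assumptions chosen for illustration only; none of them come from MOVES.

```python
import math
import random
import statistics


def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def simulate_sample_means(n, trials, rng):
    """Draw `trials` samples of size n from an exponential population
    (mean 1, standard deviation 1) and return the list of sample means."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(trials)]


rng = random.Random(12345)  # fixed seed (arbitrary) so the run is repeatable
for n in (5, 30, 120):
    means = simulate_sample_means(n, 4000, rng)
    # Columns: n, mean of the sample means, their observed spread,
    # and the spread predicted by theorem 1.
    print(n,
          round(statistics.fmean(means), 3),
          round(statistics.stdev(means), 3),
          round(standard_error(1.0, n), 3))
```

In such a run the observed spread of the sample means should track $\sigma/\sqrt{n}$ closely and shrink as n grows, even though the population itself is far from normal.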

Utilizing these 2 theorems, it is possible to construct
a statistical confidence interval about the mean, using
the sample mean $\bar{x}$ and $\sigma$. Although $\bar{x}$ estimates the mean
$\mu$, the value of the population standard deviation, $\sigma$, is
not known. Therefore, $\sigma$ must be replaced with an estimate,
i.e., with the sample standard deviation s:

$$ s = \sqrt{\frac{\Sigma (x - \bar{x})^2}{n - 1}} $$

where $\Sigma$ = summation.

Sample standard deviations can be expected to provide
close estimates of $\sigma$ when $n \geq 30$. In computing confidence
intervals around the mean, for this case ($n \geq 30$) it is
necessary to refer to a table of Normal Curve Areas; for
n < 30, it is necessary to refer to a table of "t" values.
In dealing with "t" values, it is assumed that the sample
is random and the population can be approximated closely
with a normal curve.
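
The construction described in this appendix can be sketched in Python as follows. This is a minimal sketch, not part of the original plan: the function name is hypothetical, the data are the 8 weekly squadron sortie counts listed in Appendix B, and the critical value 2.365 is the two-tailed .05 entry for 7 degrees of freedom in Table B. The caller is assumed to look up the critical value (a "t" value for n < 30, a normal-curve value otherwise).

```python
import math
import statistics


def mean_confidence_interval(sample, critical_value):
    """Return (lower, upper) limits: mean -/+ critical_value * s / sqrt(n).

    s is the sample standard deviation (divisor n - 1); critical_value
    is taken from a t table or a table of normal curve areas.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)  # uses the n - 1 divisor
    half_width = critical_value * s / math.sqrt(n)
    return mean - half_width, mean + half_width


squadron_sorties = [48, 53, 51, 60, 55, 57, 53, 53]  # Appendix B, Sampling 1
T_95_DF7 = 2.365  # two-tailed .05 value for 7 degrees of freedom (Table B)
low, high = mean_confidence_interval(squadron_sorties, T_95_DF7)
print(f"95% confidence interval for the mean: {low:.2f} to {high:.2f}")
```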


Appendix  B 


Student  t-test,  F-test,  and 
Confidence  Intervals 

Student  t-test 

An  important  part  of  the  validation  process  is  to  compare 
the  outputs  of  the  real  system  with  those  generated  by  its 
model  representation.  A  method  of  performing  this  comparison 
is  by  using  the  Student  t-test,  a  statistical  tool  generally 
applied  when  the  number  of  data  points  available  is  less  than 
30. The Student t distribution, on which the t-test is
based, is peaked in the center and has higher tails than does
the normal distribution. Specifically, two procedures using
the t distribution can be utilized to 1) compare the means of
2 samples, and 2) construct confidence intervals about those
mean values. These two procedures are now discussed in greater
detail.

Procedure  1: 

Assume that 2 populations with true but unknown means $\mu_1$
and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$ respectively are the subject
of analysis. Then use of the t-test requires the following
2 assumptions: 1) the observations made from each population
are random, independent, and drawn from an approximately normal
population; 2) the variances of the 2 populations are equal,
that is $\sigma_1^2 = \sigma_2^2$.

Each set of observations drawn from the population is
called a sample. If 2 samples, 1 from each population, are
taken, and the mean values of the samples are $m_1$, $m_2$
respectively, then the first procedure seeks to answer the
following question: Is the difference $m_1 - m_2$ attributable to
a population difference $\mu_1 - \mu_2$, or may it be random variation
from a single population mean?

Procedure 2:

Since the sample mean $m_1$ ($m_2$) is only an approximation
of the true mean $\mu_1$ ($\mu_2$) of the first (second) population, the
second procedure is designed to answer this question: In the
long run, between what 2 values will the true (but unknown)
mean lie a certain percent of the time? The phrase "a certain
percent of the time" is defined in terms of a number between
0 and 1.0, and is called the confidence level. A confidence
level is selected, the sample is drawn, then the lower and
upper values (called confidence limits) which could include
the unknown mean are calculated. The interval between
the upper and lower confidence limits is known as a confidence
interval. If the sampling is repeated indefinitely, with each
sample leading to a new confidence interval (a new interval
estimate), then, at a confidence level of 95%, the interval will
cover the true population mean in 95% of the samples. If one
makes a practice of sampling, and if for each sample one states
that the true mean lies within the confidence interval, 95% of
these statements will be correct; 5% will be wrong.


The  comparison  of  means  technique  is  illustrated  as 
follows.  Let  an  AV-8A/B  Harrier  squadron  operating  in  a  ship 
or  land  based  environment  be  the  real  system,  and  let  the 
USMOVES  simulation  model  be  the  computer  representation  of 
that  squadron.  Assume  that  squadron  operations  have  reached 
a  near-stable  state  and  that  the  quantity  of  sorties  flown 
per  week  is  the  subject  of  validation.  In  terms  of  the  above 
description,  squadron  summary  reports  represent  observations 
of the real system, and the replication of simulation experiments
represents observations of the simulation model. During
the  time  period  being  studied,  the  question  is  the  following: 
Does  the  simulation  model  adequately  represent  squadron 
operations  in  terms  of  sorties  flown? 

Suppose  that  a  one  week  period  of  squadron  operations  is 
simulated  and  that  this  experiment  is  replicated  with  different 
random seeds 4 more times giving a sample of size 5. Replications
in this case are random independent observations from
the  population  of  all  replications  for  this  experiment.  This 
sample  of  size  5  is  to  be  compared  with  a  sample  of  8  squadron 
summary  reports,  each  summarizing  a  period  of  1  week.  Note 
that  consecutive  1  week  summaries  are  not  necessarily 
independent  since  there  are  cases  where  the  operations  of  1 
week  heavily  influence  those  of  the  following  week.  Although 
a  sample  of  completely  independent  summaries  is  nearly 
impossible  to  select,  this  problem  should  still  be  considered. 
Unless  evidence  to  the  contrary  is  available,  both  populations 
are  assumed  to  be  normal.  The  assumption  of  equal  variances, 
if  in  doubt  however,  can  be  statistically  checked  as  will  be 
demonstrated  further.  Suppose  now  that  the  following  2 
samplings (1 from the squadron and 1 from the model) representing
sorties flown per week were obtained:

           Sampling 1            Sampling 2
        Squadron Summary        Model Output

               48                    56
               53                    63
               51                    54
               60                    58
               55                    56
               57
               53
               53

     TOTAL    430                   287
Now calculate the mean of each sample by computing

mean of sample i = (sum of sorties in sample i) / (size of sample i)

mean of sample 1 = 430/8 = 53.75 = $\bar{X}_1$

mean of sample 2 = 287/5 = 57.4 = $\bar{X}_2$

The Student t-test can now be applied to examine the
difference $\bar{X}_1 - \bar{X}_2$. If the difference is small enough, this
will indicate that both means are considered to be estimates
of the same population mean, and therefore the simulation
model seems to adequately represent the flying program of the
squadron. If the difference is not small enough, the
indication is that the model and squadron means represent
different populations; if so, modification of model logic,
or additional data analysis (or both) may be necessary. The
t-test is performed by calculating the statistic:

$$ t = (\bar{X}_1 - \bar{X}_2) \sqrt{\frac{n_1 n_2 (n_1 + n_2 - 2)}{(n_1 + n_2)\,\Sigma x^2}} $$

where $\bar{X}_1$, $\bar{X}_2$ are the sample means, $n_1$ and $n_2$ the sample
sizes of samples 1 and 2 respectively, and $\Sigma x^2$ is the pooled sum
of squares for the 2 samples, which is calculated by subtracting
the sample mean from each observed sample value, squaring the
result and summing over all observations.

$$ \Sigma x^2 = (48-53.75)^2 + (53-53.75)^2 + \cdots + (53-53.75)^2
              + (56-57.4)^2 + (63-57.4)^2 + \cdots + (56-57.4)^2
            = 93.5 + 47.2 = 140.7 $$

Setting $n_1 = 8$, $n_2 = 5$,

$$ t = (53.75 - 57.4) \sqrt{\frac{8(5)(8+5-2)}{(8+5)(140.7)}} = -1.79 $$

The magnitude of this t-value is now compared to a value obtained
from page B-6 of values for the t-distribution for various
confidence levels. Assume a confidence level of 95%. The
table value obtained for 13-2 = 11 degrees of freedom is
2.201 (2 is subtracted from the total of 13 since 2 mean
values were calculated). Since the magnitude of the calculated
t-value, 1.79, is less than the table value of 2.201, the
observed sortie average difference is not significant; i.e., the
model sortie output average adequately represents the squadron
sortie output average. Had the calculated t-value exceeded 2.201,
the model logic/data would have required more analysis.
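
The comparison of means can be checked with a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, the statistic is the pooled (equal-variance) t statistic described in this section, and the data and the table value 2.201 are the ones quoted in the text.

```python
import math
import statistics


def pooled_t_statistic(sample_1, sample_2):
    """Pooled two-sample t statistic, assuming equal population variances."""
    n1, n2 = len(sample_1), len(sample_2)
    m1, m2 = statistics.fmean(sample_1), statistics.fmean(sample_2)
    # Pooled sum of squares: squared deviations of each observation
    # from its own sample mean, summed over both samples.
    sum_sq = (sum((x - m1) ** 2 for x in sample_1)
              + sum((x - m2) ** 2 for x in sample_2))
    return (m1 - m2) * math.sqrt(n1 * n2 * (n1 + n2 - 2) / ((n1 + n2) * sum_sq))


squadron = [48, 53, 51, 60, 55, 57, 53, 53]  # Sampling 1, squadron summaries
model = [56, 63, 54, 58, 56]                 # Sampling 2, model output
T_TABLE = 2.201  # two-tailed .05 value for 11 degrees of freedom

t = pooled_t_statistic(squadron, model)
verdict = "not significant" if abs(t) < T_TABLE else "significant"
print(f"t = {t:.2f}; difference is {verdict} at the 95% level")
```

The sign of t only reflects which sample is listed first; the comparison with the table uses its magnitude.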
F-test


If  the  assumption  of  equal  population  variances  is  in 
doubt,  another  statistical  test  called  the  F-test  may  be 
used.  The  confidence  level  concept  forms  the  basis  of  the 
F-test  just  as  it  did  in  the  t-test.  In  general  the  F-test 
compares  the  variances  of  samples  drawn  from  2  populations 
and if the ratio of the larger variance divided by the smaller
exceeds  a  table  value  chosen  at  a  specified  confidence  level, 
then  the  hypothesis  that  the  population  variances  are  equal 
would  be  rejected.  To  apply  the  test  to  the  2  samples  listed 
above,  calculate  the  sample  variances  as  follows: 

variance of sample i = (sum of squares for sample i) / ((size of sample i) - 1)

$$ \text{variance}_1 = \frac{(48-53.75)^2 + (53-53.75)^2 + \cdots + (53-53.75)^2}{8-1} = \frac{93.5}{7} = 13.36 $$

$$ \text{variance}_2 = \frac{(56-57.4)^2 + (63-57.4)^2 + \cdots + (56-57.4)^2}{5-1} = \frac{47.2}{4} = 11.80 $$

The computed F-value is

$$ F = \frac{\text{variance}_1}{\text{variance}_2} = \frac{13.36}{11.80} = 1.13 $$

The table value as shown on page B-7 for F, assuming a
confidence level of 95% for 8-1 = 7 and 5-1 = 4 degrees of
freedom respectively, is 6.09. Since the calculated F-value
is less than the table value, based on the sample data listed
above, the hypothesis that the populations have equal variances
cannot be rejected. If the calculated value had exceeded the
table value, then the previous application of the t-test would
have been questionable, and other statistical tools/analyses
would be required.
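
A short Python sketch of the variance-ratio check follows, offered as an illustration rather than as the procedure actually coded for MOVES. The function name is hypothetical; the data and the table value 6.09 are those quoted in the text.

```python
import statistics


def variance_ratio(sample_1, sample_2):
    """F ratio: larger sample variance divided by the smaller.

    statistics.variance uses the (n - 1) divisor, matching the
    sample variance defined in this section.
    """
    v1 = statistics.variance(sample_1)
    v2 = statistics.variance(sample_2)
    return max(v1, v2) / min(v1, v2)


squadron = [48, 53, 51, 60, 55, 57, 53, 53]  # Sampling 1, squadron summaries
model = [56, 63, 54, 58, 56]                 # Sampling 2, model output
F_TABLE = 6.09  # 95% value for 7 and 4 degrees of freedom, as quoted

f_value = variance_ratio(squadron, model)
equal_variances_rejected = f_value > F_TABLE
print(f"F = {f_value:.2f}; equal variances rejected: {equal_variances_rejected}")
```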


Confidence  Intervals 


The  most  important  aspect  involved  in  the  application  of 
confidence  intervals  is  not  in  the  statistics  of  the  exercise, 
but  in  the  agreement  between  the  population  and  the  sample. 
Claims  that  confidence  interval  estimates  apply  to  the 
population  that  was  actually  sampled,  rest  on  the  judgement 
of  the  investigator.  Careful  investigators  take  pains  to 
describe  any  relevant  characteristics  of  their  data  in  order 
that  others  can  envisage  the  nature  of  the  sampled  population. 

Consider  a  confidence  interval,  at  a  probability 
(confidence)  level  of  95%.  Before  the  random  sample  is  drawn, 
the  confidence  statement  is,  "the  probability  is  95%  that  the 
interval  to  be  constructed  would  include  the  true  (but  unknown) 
population  mean".  After  the  sample  is  drawn,  the  confidence 
statement is true or false; it is incorrect to say "the
probability  is  95%  that  the  population  mean  lies  between  the 
lower  and  upper  value  of  the  sample  just  taken".  In  a  specific 
application,  we  do  not  know  if  our  confidence  statement  is 
actually  one  of  the  95%  correct,  or  one  of  the  5%  that  are 
wrong. 


If the sampling is repeated indefinitely with each sample
leading to a new confidence interval (a new interval estimate),
then in 95% of the samples, the interval will cover the true
population average. If one makes a practice of sampling, and
if for each sample one states that the true mean lies within the
confidence interval, 95% of the statements will be correct.

The uncertainty comes from the sampling process. Each
sample specifies an interval estimate. Whether or not each
interval happens to include the true population mean is a risk
(probability). This probability is not that of the true mean lying
in the interval, because the true mean is unknown but fixed,
and thus cannot have a distribution. The risk is the probability
of the interval (the random variable) containing the true
value. The confidence statement concerns the population mean;
it does not concern the mean in other samples to be drawn.
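
The repeated-sampling interpretation can be illustrated with a small simulation. The sketch below is an assumption-laden illustration: the population mean and standard deviation, sample size, seed, and function name are invented for the demonstration, the population is taken to be normal, and $\sigma$ is treated as known so that the normal-curve value 1.960 applies.

```python
import math
import random
import statistics


def coverage_rate(true_mean, sigma, n, trials, z, rng):
    """Fraction of repeated samples whose interval covers true_mean.

    Each trial draws a fresh sample of size n and builds the interval
    sample_mean -/+ z * sigma / sqrt(n); the true mean never changes,
    only the interval does.
    """
    half_width = z * sigma / math.sqrt(n)
    covered = 0
    for _ in range(trials):
        sample_mean = statistics.fmean(
            rng.gauss(true_mean, sigma) for _ in range(n))
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            covered += 1
    return covered / trials


rate = coverage_rate(true_mean=55.0, sigma=4.0, n=8, trials=5000,
                     z=1.960, rng=random.Random(2024))
print(f"intervals covering the true mean: {rate:.1%}")
```

Any single interval either covers the fixed true mean or does not; only the long-run fraction of covering intervals is expected to be close to 95%.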
Table B. Table of Critical Values of t*

              Level of significance for one-tailed test
   df      .10       .05      .025       .01      .005     .0005
              Level of significance for two-tailed test
           .20       .10       .05       .02       .01      .001

    1     3.078     6.314    12.706    31.821    63.657   636.619
    2     1.886     2.920     4.303     6.965     9.925    31.598
    3     1.638     2.353     3.182     4.541     5.841    12.941
    4     1.533     2.132     2.776     3.747     4.604     8.610
    5     1.476     2.015     2.571     3.365     4.032     6.859

    6     1.440     1.943     2.447     3.143     3.707     5.959
    7     1.415     1.895     2.365     2.998     3.499     5.405
    8     1.397     1.860     2.306     2.896     3.355     5.041
    9     1.383     1.833     2.262     2.821     3.250     4.781
   10     1.372     1.812     2.228     2.764     3.169     4.587

   11     1.363     1.796     2.201     2.718     3.106     4.437
   12     1.356     1.782     2.179     2.681     3.055     4.318
   13     1.350     1.771     2.160     2.650     3.012     4.221
   14     1.345     1.761     2.145     2.624     2.977     4.140
   15     1.341     1.753     2.131     2.602     2.947     4.073

   16     1.337     1.746     2.120     2.583     2.921     4.015
   17     1.333     1.740     2.110     2.567     2.898     3.965
   18     1.330     1.734     2.101     2.552     2.878     3.922
   19     1.328     1.729     2.093     2.539     2.861     3.883
   20     1.325     1.725     2.086     2.528     2.845     3.850

   21     1.323     1.721     2.080     2.518     2.831     3.819
   22     1.321     1.717     2.074     2.508     2.819     3.792
   23     1.319     1.714     2.069     2.500     2.807     3.767
   24     1.318     1.711     2.064     2.492     2.797     3.745
   25     1.316     1.708     2.060     2.485     2.787     3.725

   26     1.315     1.706     2.056     2.479     2.779     3.707
   27     1.314     1.703     2.052     2.473     2.771     3.690
   28     1.313     1.701     2.048     2.467     2.763     3.674
   29     1.311     1.699     2.045     2.462     2.756     3.659
   30     1.310     1.697     2.042     2.457     2.750     3.646

   40     1.303     1.684     2.021     2.423     2.704     3.551
   60     1.296     1.671     2.000     2.390     2.660     3.460
  120     1.289     1.658     1.980     2.358     2.617     3.373
  inf     1.282     1.645     1.960     2.326     2.576     3.291

* Table B is abridged from Table III of Fisher and Yates: Statistical tables for
biological, agricultural, and medical research, published by Oliver and Boyd Ltd.,
Edinburgh, by permission of the authors and publishers.
Appendix C

Wilcoxon-Mann-Whitney  Test 



A most useful alternative to the t-test is the
non-parametric Wilcoxon-Mann-Whitney test. This can be used
to test whether the averages of two sets of independent samples
differ. The procedure is to:

a. Choose a significance level for the test. This is
called α, and is equal to 1.00 - Confidence level.

b. Combine the observations from the two samples and
rank them in order of increasing size from smallest to
largest. Assign 1 to the smallest, 2 to the next smallest, etc.
In case of ties, average the rankings.

c. Let n1 = smaller sample size, n2 = larger sample
size and N = n1 + n2 = total sample size.

d. Compute R1 = sum of ranks for the smaller sample.
If the sample sizes are equal, use the sum of ranks for either
sample.

e. Compute R2 = n1 (N + 1) - R1.

f. Find the value of R in the table on page C-3 or C-4
(Critical R Values) for α and (n1, n2). If either R1 or R2 is
smaller than, or equal to, the table R for (n1, n2), the averages
of the two samples differ. Otherwise, there is no reason
to believe that the averages differ.


For example:

a. α = 1.00 - .90 = .10, for 2-sided test

     Group A            Group B

     50.5  (9)          57.0  (17)
     37.5  (1)          52.0  (11)
     49.8  (7)          51.0  (10)
     56.0  (15.5)       44.2  (3)
     42.0  (2)          55.0  (14)
     56.0  (15.5)       62.0  (19)
     50.0  (8)          59.0  (18)
     54.0  (13)         45.2  (5)
     48.0  (6)          53.5  (12)
                        44.4  (4)

Numbers in parentheses indicate rankings.

c. n1 = 9, n2 = 10, N = 19

d. R1 = 77

e. R2 = 9 (20) - 77 = 103

f. R for .10 and (9, 10) = 69. Since neither R1 nor R2 is
smaller than 69, there is no reason to believe that
the averages of the two groups differ.
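
The rank-sum steps above can be sketched in Python as follows. This is a minimal illustration, not part of the MOVES software; the function names (rank_with_ties, rank_sums, averages_differ) are hypothetical, and the critical value 69 is taken from the worked example rather than computed.

```python
def rank_with_ties(values):
    """Return ranks (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sums(sample_a, sample_b):
    """Steps b to e: return n1, n2, R1 and R2 for two independent samples."""
    if len(sample_a) <= len(sample_b):
        small, large = list(sample_a), list(sample_b)
    else:
        small, large = list(sample_b), list(sample_a)
    n1, n2 = len(small), len(large)
    total = n1 + n2
    ranks = rank_with_ties(small + large)
    r1 = sum(ranks[:n1])          # sum of ranks for the smaller sample
    r2 = n1 * (total + 1) - r1    # R2 = n1 (N + 1) - R1
    return n1, n2, r1, r2


def averages_differ(r1, r2, critical_r):
    """Step f: the averages differ if either rank sum is <= the table R."""
    return min(r1, r2) <= critical_r


# Data from the worked example (Group A has 9 values, Group B has 10).
group_a = [50.5, 37.5, 49.8, 56.0, 42.0, 56.0, 50.0, 54.0, 48.0]
group_b = [57.0, 52.0, 51.0, 44.2, 55.0, 62.0, 59.0, 45.2, 53.5, 44.4]

n1, n2, r1, r2 = rank_sums(group_a, group_b)
print(n1, n2, r1, r2)                  # sample sizes and the two rank sums
print(averages_differ(r1, r2, 69))     # 69 = table R for alpha .10 and (9, 10)
```

The critical value must still be read from the table of Critical R Values; the sketch only automates the ranking and the arithmetic.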


R (RANK) VALUES

CRITICAL  VALUES  OF  SMALLER  RANK  SUM  FOR  THE  WILCOXON-MANN-WHITNEY  TEST 
For larger values of n1 and n2, critical values are given to a good approximation by the formula:

\[ \frac{n_1}{2}\,(n_1 + n_2 + 1) \;-\; z\,\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} \]

where z = 1.28 for α = .20 (two-sided test)
      z = 1.64 for α = .10        "
      z = 1.96 for α = .05        "
      z = 2.58 for α = .01        "

From Nonparametric and Shortcut Statistics by M. W. Tate and R. C. Clelland, Copyright 1957, Interstate Printers and Publishers.
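
As a rough check on the large-sample approximation, the short Python sketch below evaluates it for the sample sizes of the worked example. It assumes the approximation has the usual normal-approximation form, (n1/2)(n1 + n2 + 1) - z * sqrt(n1 n2 (n1 + n2 + 1) / 12); the function name approx_critical_r and the dictionary Z_TWO_SIDED are hypothetical names introduced for illustration.

```python
import math

# z multipliers for the two-sided significance levels listed above.
Z_TWO_SIDED = {0.20: 1.28, 0.10: 1.64, 0.05: 1.96, 0.01: 2.58}


def approx_critical_r(n1, n2, z):
    """Approximate critical value of the smaller rank sum.

    Assumed form: (n1 / 2)(n1 + n2 + 1) - z * sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    """
    center = n1 * (n1 + n2 + 1) / 2
    spread = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return center - z * spread


# Sample sizes from the worked example, alpha = .10 two-sided.
value = approx_critical_r(9, 10, Z_TWO_SIDED[0.10])
print(round(value, 2))
```

For n1 = 9 and n2 = 10 the result falls just below 70, which agrees with the tabled value of 69 used in the example.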


Appendix  D 


Determining  if  observed  data  fit  a  Normal  Curve. 


There  are  a  number  of  methods  by  which  one  can  relatively 
quickly  determine  if  observed  data  are  normally  distributed. 
This  Appendix  discusses  three  methods. 


1) If the cumulative "less than" percentage distribution
of a set of data which fits closely to a Normal Curve is
plotted on probability paper, the points will lie on a
straight line (or reasonably close to a straight line). For
example, consider the following data distribution of wearout
times of a particular mechanical assembly:


Class Limits;
Wearout Times        Frequency

15.0 - 15.8               9
15.9 - 16.7              24
16.8 - 17.6              51
17.7 - 18.5              66
18.6 - 19.4              72
19.5 - 20.3              48
20.4 - 21.2              21
21.3 - 22.1               6
22.2 - 23.0               3
                        ---
                        300

Converting  this  information  into  a  cumulative  percentage 
distribution  yields: 


Class boundaries;        Cumulative
Wearout Times            Percentage

less than 14.95              0.0
less than 15.85              3.0
less than 16.75             11.0
less than 17.65             28.0
less than 18.55             50.0
less than 19.45             74.0
less than 20.35             90.0
less than 21.25             97.0
less than 22.15             99.0
less than 23.05            100.0

Plot these points on probability paper as shown on page D-2.


It seems reasonable to state that on the basis of the plot
shown on page D-2, the points lie very close to a straight
line; hence the original distribution of data can be
approximated very closely with a Normal Curve. Note that
the first and last "less than" boundaries were not plotted,
since 0 or 100 percent of the Normal Curve can never be
reached.

[Figure, page D-2: cumulative probability plot of the wearout times on probability paper]
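
Where probability paper is not to hand, the same check can be sketched numerically. The Python example below converts the frequency table into cumulative percentages, transforms each percentage into a standard normal deviate (which is what the probability-paper scale does), and measures how close the points are to a straight line with a correlation coefficient. Using a correlation coefficient in place of a visual judgment is an assumption of this sketch, not a step in the original method, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

# Upper class boundaries and frequencies from the wearout-time distribution.
upper_boundaries = [15.85, 16.75, 17.65, 18.55, 19.45, 20.35, 21.25, 22.15, 23.05]
frequencies = [9, 24, 51, 66, 72, 48, 21, 6, 3]


def cumulative_percentages(freqs):
    """Cumulative 'less than' percentage at each upper class boundary."""
    total = sum(freqs)
    running = 0
    result = []
    for f in freqs:
        running += f
        result.append(100.0 * running / total)
    return result


def pearson_r(pairs):
    """Plain Pearson correlation of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


cum_pct = cumulative_percentages(frequencies)

# The 100 percent point is dropped: it has no finite normal deviate.
points = [
    (x, NormalDist().inv_cdf(p / 100.0))
    for x, p in zip(upper_boundaries, cum_pct)
    if 0.0 < p < 100.0
]

r = pearson_r(points)
print(cum_pct)
print(round(r, 4))   # values near 1 indicate points close to a straight line
```

A coefficient close to 1 corresponds to points lying "reasonably close" to a straight line on probability paper; how close is close enough remains a judgment call.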

A disadvantage of this method is that one must decide
subjectively whether the points fall "reasonably close" to a
straight line. This subjectivity can be eliminated by applying
a more precise method, the "Goodness of Fit" test, to
the data. This test is the Chi-Square (χ²) test; it
statistically compares the frequencies from the original
distribution to the expected frequencies from a theoretical
normal curve. The explanation of the χ² test is not presented
herein. The χ² application in this case can be calculated by
the interested reader to determine if the observed distribution
constitutes a sample from a population having a Normal
Distribution.


2) Another method of identifying a possible Normal
Distribution is to compare the mean, median and mode. If
mean = median = mode (or if they are reasonably close), the
observed distribution is likely to follow a Normal
Distribution pattern closely. The mean is defined
as the arithmetic mean. The median is the value that
corresponds to the point which divides the distribution
into two equal parts. The mode is the value which occurs
most often.
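
As an illustration of method 2, the Python sketch below computes the three measures for the grouped wearout-time distribution given under method 1. Because only grouped data are available, it assumes the usual grouped-data conventions: class midpoints stand in for the individual observations, the median is found by linear interpolation within its class, and the mode is taken as the midpoint of the most frequent class. These conventions and the function names are assumptions of the sketch.

```python
# Lower class boundaries, class width and frequencies of the wearout times.
lower_boundaries = [14.95, 15.85, 16.75, 17.65, 18.55, 19.45, 20.35, 21.25, 22.15]
class_width = 0.9
frequencies = [9, 24, 51, 66, 72, 48, 21, 6, 3]


def grouped_mean(lowers, width, freqs):
    """Mean using class midpoints as stand-ins for the observations."""
    total = sum(freqs)
    return sum((lb + width / 2) * f for lb, f in zip(lowers, freqs)) / total


def grouped_median(lowers, width, freqs):
    """Median by linear interpolation inside the class holding the middle item."""
    half = sum(freqs) / 2
    below = 0
    for lb, f in zip(lowers, freqs):
        if below + f >= half:
            return lb + (half - below) / f * width
        below += f
    raise ValueError("empty distribution")


def grouped_mode(lowers, width, freqs):
    """Midpoint of the class with the highest frequency."""
    index = max(range(len(freqs)), key=lambda i: freqs[i])
    return lowers[index] + width / 2


mean_value = grouped_mean(lower_boundaries, class_width, frequencies)
median_value = grouped_median(lower_boundaries, class_width, frequencies)
mode_value = grouped_mode(lower_boundaries, class_width, frequencies)
print(round(mean_value, 3), round(median_value, 3), round(mode_value, 3))
```

For these data the three values lie within about half a class width of one another, which is consistent with the straight-line plot obtained under method 1.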

3) A third method is to compute the mean and standard
deviation of a set of representative data. If much more or
much less (admittedly subjective) than 5% of the distribution
lies outside the limits of the mean ± 2 standard deviations,
the data probably follow a distribution other than the Normal.
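
Method 3 can be sketched in Python as shown below. The example reuses the wearout-time distribution from method 1 and, since the individual observations are not listed, assumes each observation sits at its class midpoint; that substitution and the function name fraction_outside_two_sd are assumptions made for illustration.

```python
from statistics import mean, stdev

# Class midpoints and frequencies of the wearout-time distribution.
midpoints = [15.4, 16.3, 17.2, 18.1, 19.0, 19.9, 20.8, 21.7, 22.6]
frequencies = [9, 24, 51, 66, 72, 48, 21, 6, 3]

# Expand the grouped data: every observation is placed at its class midpoint.
values = [m for m, f in zip(midpoints, frequencies) for _ in range(f)]


def fraction_outside_two_sd(data):
    """Fraction of the data lying outside mean +/- 2 standard deviations."""
    centre = mean(data)
    spread = stdev(data)               # sample standard deviation
    low, high = centre - 2 * spread, centre + 2 * spread
    outside = sum(1 for x in data if x < low or x > high)
    return outside / len(data)


fraction = fraction_outside_two_sd(values)
print(round(100 * fraction, 1), "percent outside the limits")
```

For these data roughly 6 percent of the observations fall outside the limits, close enough to 5 percent that the method gives no reason to reject the Normal Curve.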


Appendix  E 


Assumptions 

An Assumption as related to the operation of the U.K.
MOVES model is defined as one of the multifarious elements
that is not explicitly addressed throughout each simulation.
All other elements are explicitly addressed in the model and
are defined as rules. This differentiation can be considered
arbitrary; it is a convenient way of grouping these items,
and is submitted with this plan for information purposes.

The list of assumptions includes:

1. The methodology contained within the model of the
real-world (including input data and output calculations)
adequately describes the real-world situation, and this
methodology remains intact when translated into the
computer program (computerized).

2.  The  real-world  situation  being  simulated  is  in  a 
steady-state  condition. 

3.  The  form  of  the  input  data  has  been  mathematically 
described,  and  is  consistent  with  its  use  in  the  model. 

4.  Data  at  the  component  level  are  consistent  with  the 
same  data  at  the  parent  system  level. 

5. Tractors, tow-arms, mechanical handlers and lifts
never fail to operate (100% reliable).

6.  The  Sea  King  is  100%  reliable. 

7. Combat and non-combat damage, personnel casualties
(combat and non-combat), weather, fueling and re-fueling,
and fires do not affect the operation of the aircraft or
squadron.

8. Skill level within the supervisory and non-supervisory
ranks remains constant throughout each simulation.

9.  Time  spent  on  breaks  from  work  (tea,  lunch,  etc.) 
when  on-shift,  is  included  in  repair  time. 

10.  Men  are  in  proper  location  when  needed.  The  lift  is 
always  in  proper  position  when  needed. 


11.  When  a  part  is  required  and  is  in  stock,  the  part 
is  obtained  instantaneously. 

12. Time-to-next-failure and supply-delay times follow
an exponential distribution; maintenance times are the
average repair times. (In the U.S. MOVES model,
log-normal repair times are used.)
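
Assumption 12 can be illustrated with a short Python sketch of how such times might be drawn in a simulation. This is not the MOVES code; the function names, the mean values (50 hours between failures, 4 hours per repair) and the log-normal shape parameter sigma = 0.5 are hypothetical values chosen only for the example.

```python
import math
import random


def sample_time_to_failure(mean_hours, rng):
    """Exponentially distributed time to next failure with the given mean."""
    return rng.expovariate(1.0 / mean_hours)


def uk_repair_time(average_repair_hours):
    """U.K. model assumption: every repair takes the average repair time."""
    return average_repair_hours


def sample_lognormal_repair(mean_hours, sigma, rng):
    """Log-normal repair time (U.S. model assumption) with the given mean.

    For a log-normal variable, mean = exp(mu + sigma**2 / 2), so mu is
    chosen to reproduce the requested mean. sigma is a hypothetical input.
    """
    mu = math.log(mean_hours) - sigma ** 2 / 2
    return rng.lognormvariate(mu, sigma)


rng = random.Random(1)          # fixed seed so the run is repeatable
n = 20000

failure_times = [sample_time_to_failure(50.0, rng) for _ in range(n)]
repair_times = [sample_lognormal_repair(4.0, 0.5, rng) for _ in range(n)]

avg_failure = sum(failure_times) / n
avg_repair = sum(repair_times) / n
print(round(avg_failure, 1), round(avg_repair, 2), uk_repair_time(4.0))
```

Both sampled averages should settle near the requested means as the number of draws grows, while the U.K. convention returns the average repair time unchanged on every call.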


